Characteristics of time series (ts)
Classical decomposition
Characteristics of time series
- Expectation, mean & variance
- Covariance & correlation
- Stationarity
- Autocovariance & autocorrelation
- Correlograms
3 Jan 2024

Discrete time: \(x_t\)
- Discrete (e.g., total # of fish caught per trawl)
- Continuous (e.g., salinity, temperature)
- Univariate/scalar (e.g., total # of fish caught)
- Multivariate/vector (e.g., # of each spp of fish caught)
- Integer (e.g., # of fish in 5-min trawl = 2413)
- Real (e.g., fish mass = 10.2 g)
Univariate \((x_t)\)
Multivariate \(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}_t\)
Most statistical analyses are concerned with estimating properties of a population from a sample
For example, we use fish caught in a seine to infer the mean size of fish in a lake
Time series analysis, however, presents a different situation: we typically have only a single realization of the underlying process
For example, one can’t observe today’s closing price of Microsoft stock more than once
Thus, conventional statistical procedures, based on large sample estimates, are inappropriate
Examples of time series:
- Number of users connected to the internet
- Number of lynx trapped in Canada from 1821-1934
Classical decomposition models a time series as the sum of a trend (\(m_t\)), a seasonal effect (\(s_t\)), and a remainder (\(e_t\)):

\[ x_t = m_t + s_t + e_t \]
We need a way to extract the so-called signal from the noise
One common method is via “linear filters”
Linear filters can be thought of as “smoothing” the data
Linear filters typically take the form
\[ \hat{m}_t = \sum_{i=-\infty}^{\infty} \lambda_i x_{t+i} \]
For example, a moving average
\[ \hat{m}_t = \sum_{i=-a}^{a} \frac{1}{2a + 1} x_{t+i} \]
If \(a = 1\), then
\[ \hat{m}_t = \frac{1}{3}(x_{t-1} + x_t + x_{t+1}) \]
As \(a\) increases, the estimated trend becomes more smooth
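To make this concrete, here is a minimal R sketch (using the built-in AirPassengers series purely for illustration) that applies the moving-average filter above with stats::filter():

```r
## moving-average estimate of the trend m_t
## weights are 1/(2a + 1) at lags -a, ..., a
ma_trend <- function(x, a) {
  stats::filter(x, filter = rep(1 / (2 * a + 1), 2 * a + 1), sides = 2)
}

xt <- AirPassengers
m_1 <- ma_trend(xt, a = 1)   # 3-point average
m_6 <- ma_trend(xt, a = 6)   # 13-point average: much smoother

plot(xt, col = "gray")
lines(m_1, col = "blue")
lines(m_6, col = "red", lwd = 2)
```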
Example: monthly airline passengers from 1949-1960, with the estimated trend \(\hat{m}_t\) becoming smoother as \(a\) increases
Once we have an estimate of the trend \(\hat{m}_t\), we can estimate \(\hat{s}_t\) simply by subtraction:
\[ \hat{s}_t = x_t - \hat{m}_t \]
Seasonal effect (\(\hat{s}_t\)), assuming \(\lambda_i = 1/9\) (a 9-point moving average)
But, \(\hat{s}_t\) really includes the remainder \(e_t\) as well
\[ \begin{align} \hat{s}_t &= x_t - \hat{m}_t \\ (s_t + e_t) &= x_t - m_t \end{align} \]
So we need to estimate the mean seasonal effect as
\[ \begin{align} \hat{s}_{\text{Jan}} &= \frac{1}{N/12} (\hat{s}_1 + \hat{s}_{13} + \hat{s}_{25} + \cdots) \\ \hat{s}_{\text{Feb}} &= \frac{1}{N/12} (\hat{s}_2 + \hat{s}_{14} + \hat{s}_{26} + \cdots) \\ &\vdots \\ \hat{s}_{\text{Dec}} &= \frac{1}{N/12} (\hat{s}_{12} + \hat{s}_{24} + \hat{s}_{36} + \cdots) \end{align} \]
Now we can estimate \(e_t\) via subtraction:
\[ \hat{e}_t = x_t - \hat{m}_t - \hat{s}_t \]
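Putting the steps together: R's built-in decompose() follows essentially this recipe, as in this minimal sketch (the log transform used here is discussed just below):

```r
## classical decomposition: x_t = m_t + s_t + e_t
## decompose() estimates m_t with a centered moving average,
## s_t as the monthly means of x_t - m_hat_t, and e_t by subtraction
xt <- log(AirPassengers)   # log stabilizes the seasonal amplitude

dcmp <- decompose(xt, type = "additive")

plot(dcmp)              # panels: observed, trend, seasonal, random
round(dcmp$figure, 3)   # the 12 estimated mean monthly effects
```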
Because the seasonal variation in the monthly airline passengers from 1949-1960 grows with the level of the series, we first log-transform the data, which makes the trend approximately linear and an additive decomposition appropriate
The expectation (\(E\)) of a variable is its mean value in the population
\(\text{E}(x) \equiv\) mean of \(x = \mu\)
We can estimate \(\mu\) from a sample as
\[ m = \frac{1}{N} \sum_{i=1}^N{x_i} \]
\(\text{E}([x - \mu]^2) \equiv\) expected deviations of \(x\) about \(\mu\)
\(\text{E}([x - \mu]^2) \equiv\) variance of \(x = \sigma^2\)
We can estimate \(\sigma^2\) from a sample as
\[ s^2 = \frac{1}{N-1}\sum_{i=1}^N{(x_i - m)^2} \]
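As a quick sanity check, a minimal R sketch (with simulated data) confirming that these estimators match the built-in mean() and var():

```r
set.seed(123)
x <- rnorm(50, mean = 10, sd = 2)   # simulated sample
N <- length(x)

m  <- sum(x) / N                 # sample mean
s2 <- sum((x - m)^2) / (N - 1)   # sample variance

all.equal(m, mean(x))   # TRUE
all.equal(s2, var(x))   # TRUE
```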
If we have two variables, \(x\) and \(y\), we can generalize variance
\[ \sigma^2 = \text{E}([x_i - \mu][x_i - \mu]) \]
into covariance
\[ \gamma_{x,y} = \text{E}([x_i - \mu_x][y_i - \mu_y]) \]
We can estimate \(\gamma_{x,y}\) from a sample as
\[ \text{Cov}(x,y) = \frac{1}{N-1}\sum_{i=1}^N{(x_i - m_x)(y_i - m_y)} \]
Correlation is a dimensionless measure of the linear association between 2 variables, \(x\) & \(y\)
It is simply the covariance standardized by the standard deviations
\[ \rho_{x,y} = \frac{\gamma_{x,y}}{\sigma_x \sigma_y} \]
\[ -1 \leq \rho_{x,y} \leq 1 \]
We can estimate \(\rho_{x,y}\) from a sample as
\[ \text{Cor}(x,y) = \frac{\text{Cov}(x,y)}{s_x s_y} \]
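The same kind of check for the covariance and correlation estimators (again with simulated data):

```r
set.seed(123)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)   # y linearly related to x, plus noise
N <- length(x)

cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)
cor_xy <- cov_xy / (sd(x) * sd(y))

all.equal(cov_xy, cov(x, y))   # TRUE
all.equal(cor_xy, cor(x, y))   # TRUE
```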
Consider a single value, \(x_t\)
\(\text{E}(x_t)\) is taken across an ensemble of all possible time series
Our single realization is our estimate!
If \(\text{E}(x_t)\) is constant across time, we say the time series is stationary in the mean
Stationarity is a convenient assumption that allows us to describe the statistical properties of a time series.
In general, a time series is said to be stationary if there is no systematic change in the mean or variance, and no systematic trend or periodic variation
Our eyes are really bad at identifying stationarity, so we will learn some tools to help us
For stationary ts, we define the autocovariance function (\(\gamma_k\)) as
\[ \gamma_k = \text{E}([x_t - \mu][x_{t+k} - \mu]) \]
which means that
\[ \gamma_0 = \text{E}([x_t - \mu][x_{t} - \mu]) = \sigma^2 \]
“Smooth” time series have large ACVF for large \(k\)
“Choppy” time series have ACVF near 0 for small \(k\)
We can estimate \(\gamma_k\) from a sample as
\[ c_k = \frac{1}{N}\sum_{t=1}^{N-k}{(x_t - m)(x_{t+k} - m)} \]
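A minimal sketch (using a simulated AR(1) series purely for illustration) that computes \(c_k\) by hand and checks it against acf(..., type = "covariance"), which uses the same \(1/N\) scaling:

```r
set.seed(123)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))
N <- length(x)
m <- mean(x)

## sample autocovariance at lag k (note the 1/N scaling)
c_k <- function(k) sum((x[1:(N - k)] - m) * (x[(1 + k):N] - m)) / N

c(c_k(0), c_k(1), c_k(2))

## acf() with type = "covariance" uses the same estimator
acvf <- acf(x, type = "covariance", plot = FALSE)
acvf$acf[1:3]   # lags 0, 1, 2 -- should match the values above
```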
The autocorrelation function (ACF) is simply the ACVF normalized by the variance
\[ \rho_k = \frac{\gamma_k}{\sigma^2} = \frac{\gamma_k}{\gamma_0} \]
The ACF measures the correlation of a time series against a time-shifted version of itself
We can estimate ACF from a sample as
\[ r_k = \frac{c_k}{c_0} \]
The ACF has several important properties: it is bounded, \(-1 \leq r_k \leq 1\), and it is an even function, \(r_k = r_{-k}\)
Graphical output for the ACF (the correlogram):
- the ACF at lag = 0 is always 1
- the dashed lines show approximate 95% confidence intervals at \(\pm 1.96/\sqrt{N}\)
acf(ts_object)
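For example, with the lynx series shown earlier (a built-in R dataset):

```r
## correlogram of the lynx trappings series;
## the ACF at lag 0 is 1, and the dashed lines mark the
## approximate 95% confidence interval at +/- 1.96 / sqrt(N)
acf(lynx)
```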
Recall the transitive property, whereby

If \(A = B\) and \(B = C\), then \(A = C\)

which suggests that

If \(x \propto y\) and \(y \propto z\), then \(x \propto z\)

and thus

If \(x_t \propto x_{t+1}\) and \(x_{t+1} \propto x_{t+2}\), then \(x_t \propto x_{t+2}\)
The partial autocorrelation function (\(\phi_k\)) measures the correlation between a series \(x_t\) and \(x_{t+k}\) with the linear dependence of the intervening values \(\{x_{t+1},x_{t+2},\dots,x_{t+k-1}\}\) removed
We can estimate \(\phi_k\) from a sample as
\[ \phi_k = \begin{cases} \text{Cor}(x_1,x_0) = \rho_1 & \text{if } k = 1 \\ \text{Cor}(x_k-x_k^{k-1}, x_0-x_0^{k-1}) & \text{if } k \geq 2 \end{cases} \]

where \(x_k^{k-1}\) is the regression of \(x_k\) on the \(k-1\) intervening values

\[ x_k^{k-1} = \beta_1 x_{k-1} + \beta_2 x_{k-2} + \dots + \beta_{k-1} x_1 \]

and \(x_0^{k-1}\) is the regression of \(x_0\) on the same values

\[ x_0^{k-1} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{k-1} x_{k-1} \]
Plots: autocorrelation (ACF) vs. partial autocorrelation (PACF)
We will see that the ACF & PACF are very useful for identifying the orders of ARMA models
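As a preview, a sketch with a simulated AR(2) process (coefficients chosen arbitrarily): the ACF tails off gradually while the PACF cuts off after lag 2, matching the order of the model:

```r
set.seed(123)
## simulate a stationary AR(2) process
x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)

par(mfrow = c(1, 2))
acf(x)    # decays gradually toward 0
pacf(x)   # drops to ~0 after lag 2
```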
Often we want to look for relationships between 2 different time series
We can extend the notion of covariance to cross-covariance
We can estimate the CCVF \((g^{x,y}_k)\) from a sample as
\[ g^{x,y}_k = \frac{1}{N}\sum_{t=1}^{N-k}{(x_t - m_x)(y_{t+k} - m_y)} \]
The cross-correlation function is the CCVF normalized by the standard deviations of x & y
\[ r^{x,y}_k = \frac{g^{x,y}_k}{s_x s_y} \]
Just as with other measures of correlation
\[ -1 \leq r^{x,y}_k \leq 1 \]
ccf(x, y)
Note: the lag k value returned by ccf(x, y) is the correlation between x[t+k] and y[t]
In an explanatory context, we often think of \(y = f(x)\), so it’s helpful to use ccf(y, x) and only consider positive lags
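A minimal sketch with simulated data: y is constructed to follow x with a 3-step delay, so ccf(y, x) peaks at lag +3:

```r
set.seed(123)
n <- 200
x <- rnorm(n)
## y responds to x with a 3-step delay, plus noise
y <- c(rnorm(3, sd = 0.5), x[1:(n - 3)] + rnorm(n - 3, sd = 0.5))

## the value at lag k is cor(y[t+k], x[t]),
## so the peak appears at lag k = +3
ccf(y, x)
```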